To be secured by United States Letters Patent, what is 
claimed is: 

1. A method of extracting classifying data from an audio 
signal, the method comprising the steps of: 

(a) processing said audio signal into a 
perceptual representation of its constituent 
frequencies; 

(b) processing said perceptual representation 
into at least one learning representation of said 
audio data stream; 

(c) inputting at least one said learning 
representation into a multi-stage classifier, 
whereby said multi-stage classifier extracts 
classifying data from said learning 
representations and outputs the classification of 
said audio signal.

2. The method of extracting classifying data from an
audio signal according to claim 1, wherein the step of
processing the audio data into a perceptual
representation of its constituent frequencies
comprises calculating, for a time sample window of a 
digital representation of said audio signal, a Fast 
Fourier Transform function. 
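
Illustrative sketch (not part of the claims): the Fast Fourier Transform step of claim 2 might look like the following, assuming a window whose length is a power of two and a plain rectangular window. The function names `fft` and `window_magnitudes` and the 64-sample test tone are hypothetical and chosen only for this example.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def window_magnitudes(window_samples):
    """Magnitudes of the non-negative-frequency bins for one time sample window."""
    spectrum = fft(window_samples)
    return [abs(c) for c in spectrum[: len(window_samples) // 2 + 1]]


# Hypothetical input: a 64-sample window holding a sinusoid centred on bin 8.
N = 64
tone = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]
mags = window_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)
```

The list of magnitudes for one window is one column of what the claims call the perceptual representation; a production system would more likely call an optimised FFT library.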

3. The method of extracting classifying data from an
audio signal according to claim 1, wherein the step of
processing said perceptual representation into at 
least one learning representation further comprises 
dividing said perceptual representation into a 
plurality of time slices. 

4. The method of extracting classifying data from an 
audio signal according to claim 3, wherein each of 
said time slices is about 0.8 to about 1.2 seconds in 
length. 
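
Illustrative sketch (not part of the claims): one way to form the time slices of claims 3 and 4, assuming the perceptual representation is a list of spectral frames produced at a known rate. The name `split_into_time_slices`, the frame rate of 12 per second, and the choice to drop a trailing partial slice are assumptions made for this example.

```python
def split_into_time_slices(frames, frames_per_second, slice_seconds=1.0):
    """Group consecutive spectral frames into equal slices of roughly slice_seconds.

    A trailing partial slice is dropped so every slice has the same length.
    """
    per_slice = max(1, round(frames_per_second * slice_seconds))
    full = len(frames) // per_slice
    return [frames[i * per_slice:(i + 1) * per_slice] for i in range(full)]


# Hypothetical input: 30 single-value frames produced at 12 frames per second.
frames = [[float(i)] for i in range(30)]
slices = split_into_time_slices(frames, frames_per_second=12)
print(len(slices), len(slices[0]))
```

With the default of 1.0 second the slice length sits in the middle of the range recited in claim 4.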

5. The method of extracting classifying data from an 
audio signal according to claim 1, wherein the step of 
dividing the perceptual representation into learning 
representations further comprises dividing said 
perceptual representation into a plurality of 
frequency bands.

6. The method of extracting classifying data from an
audio signal according to claim 5, wherein said
plurality of frequency bands comprises 20 frequency
bands.

7. The method of extracting classifying data from an 
audio signal according to claim 5, wherein the size of 
each of said frequency bands grows according to the 
golden ratio of frequency with respect to pitch. 
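
Illustrative sketch (not part of the claims): claim 7 does not spell out how band size follows the golden ratio. The sketch below assumes one possible reading, in which each band is wider than the band below it by a factor of the golden ratio, and borrows the 20 bands of claim 6 and the 11 kHz ceiling of claim 8 as defaults. The name `golden_ratio_band_edges` and the starting edge of 0 Hz are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def golden_ratio_band_edges(num_bands=20, max_hz=11000.0):
    """Band edges in Hz where each band is PHI times wider than the previous one.

    The first width is chosen so the geometric series of widths sums to max_hz.
    """
    first_width = max_hz * (PHI - 1) / (PHI ** num_bands - 1)
    edges = [0.0]
    for k in range(num_bands):
        edges.append(edges[-1] + first_width * PHI ** k)
    return edges


edges = golden_ratio_band_edges()
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print(len(widths), round(edges[-1]))
```

Under this reading the lowest bands are very narrow and the highest band covers several kilohertz; other readings of the claim would give different edges.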

8. The method of extracting classifying data from an 
audio signal according to claim 5, wherein no
said frequency band includes any frequency greater 
than 11 kHz. 

9. The method of extracting classifying data from an 
audio signal according to claim 1, wherein a first 
stage of said multi-stage classifier comprises at 
least one Support Vector Machine.

10. The method of extracting classifying data from an
audio signal according to claim 9, wherein said first
stage of said multi-stage classifier comprises at 
least one Support Vector Machine per category of 
classification. 
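
Illustrative sketch (not part of the claims): a first stage with one Support Vector Machine per category could be arranged as a set of one-vs-rest linear machines. The sketch assumes a linear kernel trained by subgradient descent on the hinge loss, which the claims do not specify. The function names, the training constants, and the two-dimensional toy data are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=100, rate=0.1, reg=0.01):
    """Fit weights and bias by subgradient descent on the regularised hinge loss.

    labels holds +1 for the target category and -1 for every other category.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # L2 regularisation shrinks the weights on every step.
            weights = [w * (1.0 - rate * reg) for w in weights]
            if y * score < 1.0:  # sample lies inside the margin: hinge loss is active
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias


def train_per_category(samples, categories):
    """Train one one-vs-rest SVM for each distinct category label."""
    machines = {}
    for category in sorted(set(categories)):
        labels = [1 if c == category else -1 for c in categories]
        machines[category] = train_linear_svm(samples, labels)
    return machines


def category_scores(machines, x):
    """Signed decision value from every per-category SVM for one input vector."""
    return {
        category: sum(w * xi for w, xi in zip(weights, x)) + bias
        for category, (weights, bias) in machines.items()
    }


# Hypothetical two-dimensional learning representations, three per category.
samples = [
    (2.0, 0.1), (2.2, -0.2), (1.8, 0.3),
    (-0.1, 2.0), (0.2, 2.1), (-0.3, 1.9),
    (-2.0, -2.0), (-1.8, -2.2), (-2.1, -1.9),
]
categories = ["rock"] * 3 + ["jazz"] * 3 + ["classical"] * 3
machines = train_per_category(samples, categories)
scores = category_scores(machines, (2.1, 0.0))
print(max(scores, key=scores.get))
```

The per-category decision values, rather than a single winning label, are what a later stage of the classifier would receive as input.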


11. The method of extracting classifying data from an 
audio signal according to claim 1, wherein a final 
stage of said multi-stage classifier comprises a 
neural network. 


12. The method of extracting classifying data from an 
audio signal according to claim 11, wherein said 
neural network comprises at least one input node per 
category of classification, and further wherein said 
neural net comprises at least one output node per 
category of classification. 

13. The method of extracting classifying data from an
audio signal according to claim 12, wherein said
neural network comprises a hidden layer, wherein said
hidden layer comprises at least as many nodes as the
number of said input nodes.

14. The method of extracting classifying data from an
audio signal according to claim 11, wherein said 
neural network operates on a Gaussian activation 
function. 
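
Illustrative sketch (not part of the claims): the forward pass of a final-stage network shaped as in claims 12 to 14, with one input node and one output node per category, a hidden layer of the same size, and a Gaussian activation. The sketch assumes the activation exp(-x^2) and uses hand-picked, untrained weights; the names `gaussian` and `forward` and all numeric values are hypothetical.

```python
import math


def gaussian(x):
    """Gaussian activation: 1 at x = 0, falling toward 0 as |x| grows."""
    return math.exp(-x * x)


def layer(inputs, weights, biases):
    """One fully connected layer followed by the Gaussian activation."""
    return [
        gaussian(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input nodes -> hidden layer -> output nodes, one output per category."""
    return layer(layer(inputs, hidden_w, hidden_b), output_w, output_b)


# Hypothetical, untrained parameters for three categories (identity weights).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
biases = [-1.0, -1.0, -1.0]

# Hypothetical first-stage scores: the first category scored +1, the rest -1.
outputs = forward([1.0, -1.0, -1.0], identity, biases, identity, biases)
best = max(range(len(outputs)), key=outputs.__getitem__)
print(best)
```

In practice the weights would be learned from first-stage outputs on training audio; the fixed values here only show how data flows through the layers.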

15. The method of extracting classifying data from an 
audio signal according to claim 1, wherein said 
classifying data comprises at least one of artist and 
genre.

16. The method of extracting classifying data from an 
audio signal according to claim 1, further comprising 
the step of converting said audio signal into a pulse 
code modulated digital bit stream. 

17. The method of extracting classifying data from an 
audio signal according to claim 1, further comprising 
the step of measuring the confidence of said 
classification by said multi-stage classifier.

18. A computer readable storage medium, storing
therein a program of instructions for causing a 
computer to execute a process of extracting classifying
data from an audio signal, said process comprising the 
steps of: 

(a) processing said audio signal into a 
perceptual representation of its constituent 
frequencies; 

(b) processing said perceptual representation 
into at least one learning representation; 

(c) inputting said learning representations of 
said audio data stream into a multi-stage 
classifier, whereby said multi-stage classifier 
extracts classifying data from said learning 
representations and outputs the classification of 
said audio signal.

19. A method of representing an audio signal for 
machine learning comprising: 

(a) creating a perceptual representation of said
audio signal by performing a frequency domain
transform on at least one time-sampled window of
a digital representation of said audio signal,
said perceptual representation comprising
component magnitudes of constituent frequency
vectors that comprise said audio signal;

(b) calculating a magnitude of each constituent 
frequency vector within said audio signal; 

(c) grouping each of said constituent frequency 
vectors into a number of frequency bands; 

(d) calculating an average magnitude of said 
constituent frequency vectors within each of said 
frequency bands; and 

(e) arranging said magnitudes into a learning 
representation.

20. The method according to claim 19 wherein said 
frequency domain transform is a Fast Fourier 
Transform. 


21. The method according to claim 19 wherein an 
average magnitude of said constituent frequency 
vectors within each of said frequency bands further
comprises an aggregate average magnitude over a
plurality of said time-sampled windows. 


22. The method according to claim 21 wherein said
plurality of time-sampled windows comprises 12 time- 
sampled windows. 
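
Illustrative sketch (not part of the claims): steps (c) through (e) of claim 19, combined with the aggregate average of claims 21 and 22, could be computed as below. The sketch assumes the magnitudes of each window are already available as a list and that each band is a half-open range of bin indices. The function names, the three toy bands, and the magnitude values are hypothetical.

```python
def band_averages(magnitudes, band_bins):
    """Average magnitude inside each band; band_bins holds (start, stop) bin ranges."""
    return [sum(magnitudes[a:b]) / (b - a) for a, b in band_bins]


def learning_representation(windows, band_bins):
    """Per-band average magnitude, averaged again across all supplied windows."""
    per_window = [band_averages(m, band_bins) for m in windows]
    count = len(per_window)
    return [sum(w[i] for w in per_window) / count for i in range(len(band_bins))]


# Hypothetical data: 8 frequency bins grouped into 3 bands, over 12 windows.
band_bins = [(0, 2), (2, 4), (4, 8)]
quiet = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0]
loud = [3.0 * m for m in quiet]
windows = [quiet, loud] * 6  # 12 time-sampled windows

vector = learning_representation(windows, band_bins)
print(vector)
```

The resulting vector has one entry per frequency band, so with the band counts recited elsewhere in the claims it would be a short fixed-length input for a classifier.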

23. The method according to claim 19 wherein no said 
frequency band includes any frequency greater than 11 
kHz. 

24. The method according to claim 19 wherein said
frequency bands grow in size according to the golden 
ratio of frequency with respect to pitch. 

25. The method according to claim 19 further 
comprising the step of converting said audio signal 
into a pulse code modulated bitstream for processing 
by said frequency domain transform. 

26. A computer readable storage medium, storing
therein a program of instructions for causing a
computer to execute a process of representing an audio
signal for machine learning, said process comprising
the steps of:

(a) creating a perceptual representation of said 
audio signal by performing a frequency domain 
transform on at least one time-sampled window of
a digital representation of said audio signal, 
said perceptual representation comprising 
component magnitudes of constituent frequency 
vectors that comprise said audio signal; 

(b) calculating a magnitude of each constituent 
frequency vector within said audio signal; 

(c) grouping each of said constituent frequency 
vectors into a number of frequency bands; 

(d) calculating an average magnitude of said 
constituent frequency vectors within each of said 
frequency bands; and

(e) arranging said magnitudes into a learning 
representation.

27. An apparatus for classifying an audio data stream
comprising: 

(a) a means for converting an audio data stream
into a perceptual representation of its
constituent frequencies;

(b) a means for dividing said perceptual 
representation into learning representations; and 

(c) a multi-stage classifying means trained to 
distinguish among classifying categories of said 
audio data stream, wherein said multi-stage 
classifying means outputs the classification of 
said audio data stream.

28. The apparatus according to claim 27, wherein
said means for converting an audio data stream into a
perceptual representation of its constituent
frequencies comprises means to perform a Fast Fourier
Transform function on at least one time-sampled window
of a digital representation of said audio data stream.

29. The apparatus according to claim 27, wherein said
means for dividing said perceptual representation into
learning representations further comprises means for
dividing said perceptual representation into a
plurality of time slices.

30. The apparatus according to claim 29, wherein each 
of said time slices is about 0.8 to about 1.2 seconds 
in length. 

31. The apparatus according to claim 27, wherein said 
means for dividing said perceptual representation into 
learning representations further comprises means for 
dividing said perceptual representation into a 
plurality of frequency bands. 

32. The apparatus according to claim 31, wherein said 
plurality of frequency bands comprises 20 frequency 
bands.

33. The apparatus according to claim 31, wherein the 
size of each of said frequency bands grows according 
to the golden ratio of frequency with respect to 
pitch.

34. The apparatus according to claim 31, wherein no
said frequency band includes any frequency higher than 11
kHz. 

35. The apparatus according to claim 27, wherein a
first stage of said multi-stage classifier comprises 
at least one Support Vector Machine. 

36. The apparatus according to claim 35, wherein said
first stage of said multi-stage classifier comprises
at least one Support Vector Machine per category of
classification.

37. The apparatus according to claim 27, wherein a 
final stage of said multi-stage classifier comprises a 
neural network. 

38. The apparatus according to claim 37, wherein said 
neural network comprises at least one input node per 
category of classification, and further wherein said
neural net comprises at least one output node per
category of classification. 

39. The apparatus according to claim 38, wherein said 
neural network comprises a hidden layer, wherein said 
hidden layer comprises at least as many nodes as the 
number of said input nodes. 

40. The apparatus according to claim 37, wherein said 
neural network operates on a Gaussian activation 
function. 

41. The apparatus according to claim 27, wherein said 
classifying categories comprise at least one of artist 
and genre.

42. The apparatus according to claim 27, further 
comprising a means to convert said audio data stream into a
pulse code modulated digital bitstream.

43. The apparatus according to claim 27, further
comprising a means for measuring the confidence of 
said classification by said multi-stage classifier. 
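
Illustrative sketch (not part of the claims): the claims do not say how confidence is measured. One simple possibility, assumed here purely for illustration, is the relative margin between the strongest and second-strongest output activations of the final stage. The name `classification_confidence` and the sample activations are hypothetical.

```python
def classification_confidence(outputs):
    """Relative margin between the best and second-best output activations.

    Returns a value in [0, 1]: 0 means a tie, 1 means no competing category.
    Assumes non-negative activations and at least two categories.
    """
    ranked = sorted(outputs, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if top <= 0.0:
        return 0.0
    return (top - runner_up) / top


# Hypothetical output activations for three categories.
confidence = classification_confidence([0.9, 0.3, 0.1])
print(round(confidence, 3))
```

A low value under this measure would flag a classification in which two categories scored almost equally.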


44. An apparatus for representing an audio signal for 
machine learning comprising: 

(a) a means for performing a frequency domain 
transform on at least one time-sampled window of 
a digital representation of said audio signal, 
said perceptual representation comprising 
component magnitudes of constituent frequency 
vectors that comprise said audio signal; 

(b) a means for calculating a magnitude of each 
constituent frequency vector; 

(c) a means for grouping each of said 
constituent frequency vectors into a number of 
frequency bands; 

(d) a means for calculating an average magnitude 
of said constituent frequency vectors within each 
of said frequency bands; and 

(e) a means for arranging said magnitudes into a 
learning representation.

45. The apparatus according to claim 44 wherein said
means for performing a frequency domain transform 
comprises a means for performing a Fast Fourier 
Transform.

46. The apparatus according to claim 44 wherein no 
said frequency band includes any frequency greater 
than 11 kHz. 

47. The apparatus according to claim 44 wherein said
frequency bands grow in size according to the golden 
ratio of frequency with respect to pitch. 

48. The apparatus according to claim 44 further 
comprising a means for converting said audio signal 
into a pulse code modulated bitstream for processing 
by said frequency domain transform. 